
Conjugate Direction Methods

Code

Please visit this page for a code implementation of the conjugate gradient method.

Conjugate Direction Methods

The class of conjugate direction methods can be viewed as intermediate between the method of steepest descent and Newton's method. Conjugate direction methods have the following properties:

  1. They solve quadratics of $n$ variables in $n$ steps.
  2. The usual implementation, the conjugate gradient algorithm, requires no Hessian matrix evaluations.
  3. They require no matrix inversion and no storage of an $n \times n$ matrix.

Conjugate direction methods typically perform better than the method of steepest descent, but not as well as Newton's method. As we saw with the method of steepest descent and Newton's method, the crucial factor in the efficiency of an iterative search method is the direction of search at each iteration. For a quadratic function of $n$ variables $f(\boldsymbol{x})=\frac{1}{2} \boldsymbol{x}^{\top} \boldsymbol{Q} \boldsymbol{x}-\boldsymbol{x}^{\top} \boldsymbol{b}$, $\boldsymbol{x} \in \mathbb{R}^n$, $\boldsymbol{Q}=\boldsymbol{Q}^{\top}>0$, the best direction of search, as we shall see, is the $\boldsymbol{Q}$-conjugate direction. Basically, two directions $\boldsymbol{d}^{(1)}$ and $\boldsymbol{d}^{(2)}$ in $\mathbb{R}^n$ are said to be $\boldsymbol{Q}$-conjugate if $\boldsymbol{d}^{(1) \top} \boldsymbol{Q} \boldsymbol{d}^{(2)}=0$. In general, we have the following definition.

Definition

Let $\boldsymbol{Q}$ be a real symmetric $n \times n$ matrix. The directions $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \boldsymbol{d}^{(2)}, \ldots, \boldsymbol{d}^{(m)}$ are $\boldsymbol{Q}$-conjugate if for all $i \neq j$ we have $\boldsymbol{d}^{(i) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=0$.
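For a concrete feel of the definition, here is a minimal numeric check (the matrix and vectors are illustrative choices, not from the text): with $\boldsymbol{Q}=\operatorname{diag}(3,1)$, the directions $(1,-3)^{\top}$ and $(1,1)^{\top}$ are $\boldsymbol{Q}$-conjugate even though they are not orthogonal in the usual sense.

```python
import numpy as np

# A symmetric positive definite matrix (illustrative choice).
Q = np.array([[3.0, 0.0],
              [0.0, 1.0]])

# d1 and d2 are Q-conjugate: d1^T Q d2 = 3*1*1 + 1*(-3)*1 = 0.
d1 = np.array([1.0, -3.0])
d2 = np.array([1.0, 1.0])

print(d1 @ Q @ d2)   # 0.0: Q-conjugate
print(d1 @ d2)       # -2.0: not orthogonal in the standard inner product
```

In other words, $\boldsymbol{Q}$-conjugacy is ordinary orthogonality in the inner product $\langle \boldsymbol{u}, \boldsymbol{v}\rangle = \boldsymbol{u}^{\top} \boldsymbol{Q} \boldsymbol{v}$.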

Lemma 1

Let $\boldsymbol{Q}$ be a symmetric positive definite $n \times n$ matrix. If the directions $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(k)} \in \mathbb{R}^n$, $k \leq n-1$, are nonzero and $\boldsymbol{Q}$-conjugate, then they are linearly independent.

Proof

Let $\alpha_0, \ldots, \alpha_k$ be scalars such that

$$\alpha_0 \boldsymbol{d}^{(0)}+\alpha_1 \boldsymbol{d}^{(1)}+\cdots+\alpha_k \boldsymbol{d}^{(k)}=\mathbf{0}.$$

Premultiplying this equality by $\boldsymbol{d}^{(j) \top} \boldsymbol{Q}$, $0 \leq j \leq k$, yields

$$\alpha_j \boldsymbol{d}^{(j) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=0,$$

because all other terms $\boldsymbol{d}^{(j) \top} \boldsymbol{Q} \boldsymbol{d}^{(i)}=0$, $i \neq j$, by $\boldsymbol{Q}$-conjugacy. But $\boldsymbol{Q}=\boldsymbol{Q}^{\top}>0$ and $\boldsymbol{d}^{(j)} \neq \mathbf{0}$; hence $\alpha_j=0$ for $j=0,1,\ldots,k$. Therefore, $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(k)}$, $k \leq n-1$, are linearly independent.

The Conjugate Direction Algorithm

We now present the conjugate direction algorithm for minimizing the quadratic function of $n$ variables

$$f(\boldsymbol{x})=\frac{1}{2} \boldsymbol{x}^{\top} \boldsymbol{Q} \boldsymbol{x}-\boldsymbol{x}^{\top} \boldsymbol{b},$$

where $\boldsymbol{Q}=\boldsymbol{Q}^{\top}>0$, $\boldsymbol{x} \in \mathbb{R}^n$. Note that because $\boldsymbol{Q}>0$, the function $f$ has a global minimizer that can be found by solving $\boldsymbol{Q} \boldsymbol{x}=\boldsymbol{b}$.

Basic Conjugate Direction Algorithm

Given a starting point $\boldsymbol{x}^{(0)}$ and $\boldsymbol{Q}$-conjugate directions $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(n-1)}$; for $k \geq 0$,

$$\begin{aligned} \boldsymbol{g}^{(k)} &=\nabla f\left(\boldsymbol{x}^{(k)}\right)=\boldsymbol{Q} \boldsymbol{x}^{(k)}-\boldsymbol{b}, \\ \alpha_k &=-\frac{\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(k)}}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}, \\ \boldsymbol{x}^{(k+1)} &=\boldsymbol{x}^{(k)}+\alpha_k \boldsymbol{d}^{(k)}. \end{aligned}$$
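The step size $\alpha_k$ is exactly the minimizer of $f$ along the search direction: setting $\phi(\alpha)=f(\boldsymbol{x}^{(k)}+\alpha \boldsymbol{d}^{(k)})$ and differentiating gives

$$\phi'(\alpha)=\boldsymbol{d}^{(k) \top}\left(\boldsymbol{Q}\left(\boldsymbol{x}^{(k)}+\alpha \boldsymbol{d}^{(k)}\right)-\boldsymbol{b}\right)=\boldsymbol{d}^{(k) \top} \boldsymbol{g}^{(k)}+\alpha\, \boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}=0,$$

whose unique solution is the $\alpha_k$ above; the denominator is positive because $\boldsymbol{Q}>0$ and $\boldsymbol{d}^{(k)} \neq \mathbf{0}$.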

Theorem 1

For any starting point $\boldsymbol{x}^{(0)}$, the basic conjugate direction algorithm converges to the unique $\boldsymbol{x}^*$ (that solves $\boldsymbol{Q} \boldsymbol{x}=\boldsymbol{b}$) in $n$ steps; that is, $\boldsymbol{x}^{(n)}=\boldsymbol{x}^*$.

Proof

Consider $\boldsymbol{x}^*-\boldsymbol{x}^{(0)} \in \mathbb{R}^n$. Because the $\boldsymbol{d}^{(i)}$ are linearly independent, there exist constants $\beta_i$, $i=0, \ldots, n-1$, such that

$$\boldsymbol{x}^*-\boldsymbol{x}^{(0)}=\beta_0 \boldsymbol{d}^{(0)}+\cdots+\beta_{n-1} \boldsymbol{d}^{(n-1)}.$$

Now premultiply both sides of this equation by $\boldsymbol{d}^{(k) \top} \boldsymbol{Q}$, $0 \leq k<n$, to obtain

$$\boldsymbol{d}^{(k) \top} \boldsymbol{Q}\left(\boldsymbol{x}^*-\boldsymbol{x}^{(0)}\right)=\beta_k \boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)},$$

where the terms $\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(i)}=0$, $k \neq i$, vanish by the $\boldsymbol{Q}$-conjugacy property. Hence,

$$\beta_k=\frac{\boldsymbol{d}^{(k) \top} \boldsymbol{Q}\left(\boldsymbol{x}^*-\boldsymbol{x}^{(0)}\right)}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}.$$

Now, we can write

$$\boldsymbol{x}^{(k)}=\boldsymbol{x}^{(0)}+\alpha_0 \boldsymbol{d}^{(0)}+\cdots+\alpha_{k-1} \boldsymbol{d}^{(k-1)}.$$

Therefore,

$$\boldsymbol{x}^{(k)}-\boldsymbol{x}^{(0)}=\alpha_0 \boldsymbol{d}^{(0)}+\cdots+\alpha_{k-1} \boldsymbol{d}^{(k-1)}.$$

So writing

$$\boldsymbol{x}^*-\boldsymbol{x}^{(0)}=\left(\boldsymbol{x}^*-\boldsymbol{x}^{(k)}\right)+\left(\boldsymbol{x}^{(k)}-\boldsymbol{x}^{(0)}\right)$$

and premultiplying by $\boldsymbol{d}^{(k) \top} \boldsymbol{Q}$ (the second term vanishes, since $\boldsymbol{x}^{(k)}-\boldsymbol{x}^{(0)}$ is a combination of $\boldsymbol{d}^{(0)}, \ldots, \boldsymbol{d}^{(k-1)}$, each $\boldsymbol{Q}$-conjugate to $\boldsymbol{d}^{(k)}$), we obtain

$$\boldsymbol{d}^{(k) \top} \boldsymbol{Q}\left(\boldsymbol{x}^*-\boldsymbol{x}^{(0)}\right)=\boldsymbol{d}^{(k) \top} \boldsymbol{Q}\left(\boldsymbol{x}^*-\boldsymbol{x}^{(k)}\right)=-\boldsymbol{d}^{(k) \top} \boldsymbol{g}^{(k)},$$

because $\boldsymbol{g}^{(k)}=\boldsymbol{Q} \boldsymbol{x}^{(k)}-\boldsymbol{b}$ and $\boldsymbol{Q} \boldsymbol{x}^*=\boldsymbol{b}$. Thus,

$$\beta_k=-\frac{\boldsymbol{d}^{(k) \top} \boldsymbol{g}^{(k)}}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}=\alpha_k.$$

Hence $\boldsymbol{x}^*-\boldsymbol{x}^{(0)}=\alpha_0 \boldsymbol{d}^{(0)}+\cdots+\alpha_{n-1} \boldsymbol{d}^{(n-1)}=\boldsymbol{x}^{(n)}-\boldsymbol{x}^{(0)}$, so $\boldsymbol{x}^*=\boldsymbol{x}^{(n)}$, which completes the proof.
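Theorem 1 is easy to verify numerically. A convenient source of $\boldsymbol{Q}$-conjugate directions is the set of eigenvectors of $\boldsymbol{Q}$: for symmetric $\boldsymbol{Q}$, $\boldsymbol{v}_i^{\top} \boldsymbol{Q} \boldsymbol{v}_j=\lambda_j \boldsymbol{v}_i^{\top} \boldsymbol{v}_j=0$ for $i \neq j$. A minimal NumPy sketch (the matrix, vector, and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)

# Eigenvectors of symmetric Q are Q-conjugate:
# v_i^T Q v_j = lambda_j * v_i^T v_j = 0 for i != j.
_, V = np.linalg.eigh(Q)

x = np.zeros(n)
for k in range(n):
    d = V[:, k]
    g = Q @ x - b                   # g^(k) = Qx^(k) - b
    alpha = -(g @ d) / (d @ Q @ d)  # exact line search along d^(k)
    x = x + alpha * d

x_star = np.linalg.solve(Q, b)
print(np.allclose(x, x_star))       # True: converged in exactly n steps
```
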

Lemma 2

In the conjugate direction algorithm,

$$\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(i)}=0$$

for all $k$, $0 \leq k \leq n-1$, and $0 \leq i \leq k$.

Proof

Note that

$$\boldsymbol{Q}\left(\boldsymbol{x}^{(k+1)}-\boldsymbol{x}^{(k)}\right)=\boldsymbol{Q} \boldsymbol{x}^{(k+1)}-\boldsymbol{b}-\left(\boldsymbol{Q} \boldsymbol{x}^{(k)}-\boldsymbol{b}\right)=\boldsymbol{g}^{(k+1)}-\boldsymbol{g}^{(k)},$$

because $\boldsymbol{g}^{(k)}=\boldsymbol{Q} \boldsymbol{x}^{(k)}-\boldsymbol{b}$. Since $\boldsymbol{x}^{(k+1)}-\boldsymbol{x}^{(k)}=\alpha_k \boldsymbol{d}^{(k)}$,

$$\boldsymbol{g}^{(k+1)}=\boldsymbol{g}^{(k)}+\alpha_k \boldsymbol{Q} \boldsymbol{d}^{(k)}.$$

We prove the lemma by induction. The result is true for $k=0$ because $\boldsymbol{g}^{(1) \top} \boldsymbol{d}^{(0)}=0$, as shown before. We now show that if the result is true for $k-1$ (i.e., $\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(i)}=0$ for $i \leq k-1$), then it is true for $k$ (i.e., $\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(i)}=0$ for $i \leq k$). Fix $k>0$ and $0 \leq i<k$. By the induction hypothesis, $\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(i)}=0$. Because

$$\boldsymbol{g}^{(k+1)}=\boldsymbol{g}^{(k)}+\alpha_k \boldsymbol{Q} \boldsymbol{d}^{(k)}$$

and $\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(i)}=0$ by $\boldsymbol{Q}$-conjugacy, we have

$$\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(i)}=\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(i)}+\alpha_k \boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(i)}=0.$$

It remains to be shown that

$$\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(k)}=0.$$

Indeed,

$$\begin{aligned} \boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(k)} &=\left(\boldsymbol{Q} \boldsymbol{x}^{(k+1)}-\boldsymbol{b}\right)^{\top} \boldsymbol{d}^{(k)} \\ &=\left(\boldsymbol{x}^{(k)}-\frac{\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(k)}}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}} \boldsymbol{d}^{(k)}\right)^{\top} \boldsymbol{Q} \boldsymbol{d}^{(k)}-\boldsymbol{b}^{\top} \boldsymbol{d}^{(k)} \\ &=\left(\boldsymbol{Q} \boldsymbol{x}^{(k)}-\boldsymbol{b}\right)^{\top} \boldsymbol{d}^{(k)}-\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(k)} \\ &=0, \end{aligned}$$

because $\boldsymbol{Q} \boldsymbol{x}^{(k)}-\boldsymbol{b}=\boldsymbol{g}^{(k)}$. Therefore, by induction, for all $0 \leq k \leq n-1$ and $0 \leq i \leq k$,

$$\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(i)}=0.$$
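Lemma 2 can also be observed numerically: after each step of the basic conjugate direction algorithm, the new gradient is orthogonal to every direction used so far. A small sketch (using eigenvectors of $\boldsymbol{Q}$ as the conjugate directions; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)

# Eigenvectors of symmetric Q serve as Q-conjugate directions.
_, V = np.linalg.eigh(Q)

x = np.zeros(n)
used = []                          # directions d^(0), ..., d^(k)
for k in range(n):
    d = V[:, k]
    g = Q @ x - b
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    used.append(d)
    g_next = Q @ x - b             # g^(k+1)
    # Lemma 2: g^(k+1) is orthogonal to d^(0), ..., d^(k).
    assert all(abs(g_next @ di) < 1e-8 for di in used)
print("Lemma 2 holds at every iteration")
```
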

The Conjugate Gradient Algorithm

The conjugate gradient algorithm does not use prespecified conjugate directions, but instead computes the directions as the algorithm progresses. At each stage of the algorithm, the direction is calculated as a linear combination of the previous direction and the current gradient, in such a way that all the directions are mutually $\boldsymbol{Q}$-conjugate; hence the name conjugate gradient algorithm. This calculation exploits the fact that for a quadratic function of $n$ variables, we can locate the function minimizer by performing $n$ searches along mutually conjugate directions.

As before, we consider the quadratic function

$$f(\boldsymbol{x})=\frac{1}{2} \boldsymbol{x}^{\top} \boldsymbol{Q} \boldsymbol{x}-\boldsymbol{x}^{\top} \boldsymbol{b}, \quad \boldsymbol{x} \in \mathbb{R}^n,$$

where $\boldsymbol{Q}=\boldsymbol{Q}^{\top}>0$. Our first search direction from an initial point $\boldsymbol{x}^{(0)}$ is the direction of steepest descent; that is,

$$\boldsymbol{d}^{(0)}=-\boldsymbol{g}^{(0)}.$$

Thus,

$$\boldsymbol{x}^{(1)}=\boldsymbol{x}^{(0)}+\alpha_0 \boldsymbol{d}^{(0)},$$

where

$$\alpha_0=\underset{\alpha \geq 0}{\arg \min }\, f\left(\boldsymbol{x}^{(0)}+\alpha \boldsymbol{d}^{(0)}\right)=-\frac{\boldsymbol{g}^{(0) \top} \boldsymbol{d}^{(0)}}{\boldsymbol{d}^{(0) \top} \boldsymbol{Q} \boldsymbol{d}^{(0)}}.$$

In the next stage, we search in a direction $\boldsymbol{d}^{(1)}$ that is $\boldsymbol{Q}$-conjugate to $\boldsymbol{d}^{(0)}$. We choose $\boldsymbol{d}^{(1)}$ as a linear combination of $\boldsymbol{g}^{(1)}$ and $\boldsymbol{d}^{(0)}$. In general, at the $(k+1)$th step, we choose $\boldsymbol{d}^{(k+1)}$ to be a linear combination of $\boldsymbol{g}^{(k+1)}$ and $\boldsymbol{d}^{(k)}$. Specifically, we choose

$$\boldsymbol{d}^{(k+1)}=-\boldsymbol{g}^{(k+1)}+\beta_k \boldsymbol{d}^{(k)}, \quad k=0,1,2,\ldots$$

The coefficients $\beta_k$, $k=0,1,\ldots$, are chosen so that $\boldsymbol{d}^{(k+1)}$ is $\boldsymbol{Q}$-conjugate to $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(k)}$. This is accomplished by choosing $\beta_k$ to be

$$\beta_k=\frac{\boldsymbol{g}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}.$$

The conjugate gradient algorithm is summarized below.

  1. Set $k:=0$; select the initial point $\boldsymbol{x}^{(0)}$.
  2. $\boldsymbol{g}^{(0)}=\nabla f\left(\boldsymbol{x}^{(0)}\right)$. If $\boldsymbol{g}^{(0)}=\mathbf{0}$, stop; else, set $\boldsymbol{d}^{(0)}=-\boldsymbol{g}^{(0)}$.
  3. $\alpha_k=-\dfrac{\boldsymbol{g}^{(k) \top} \boldsymbol{d}^{(k)}}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}$.
  4. $\boldsymbol{x}^{(k+1)}=\boldsymbol{x}^{(k)}+\alpha_k \boldsymbol{d}^{(k)}$.
  5. $\boldsymbol{g}^{(k+1)}=\nabla f\left(\boldsymbol{x}^{(k+1)}\right)$. If $\boldsymbol{g}^{(k+1)}=\mathbf{0}$, stop.
  6. $\beta_k=\dfrac{\boldsymbol{g}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}{\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}}$.
  7. $\boldsymbol{d}^{(k+1)}=-\boldsymbol{g}^{(k+1)}+\beta_k \boldsymbol{d}^{(k)}$.
  8. Set $k:=k+1$; go to step 3.
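The eight steps above translate directly into a short NumPy sketch (illustrative, not production code; the tolerance-based stopping test stands in for the exact condition $\boldsymbol{g}^{(k+1)}=\mathbf{0}$, and a practical version would update the gradient via $\boldsymbol{g}^{(k+1)}=\boldsymbol{g}^{(k)}+\alpha_k \boldsymbol{Q} \boldsymbol{d}^{(k)}$ to save a matrix-vector product):

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-10):
    """Minimize f(x) = 0.5 x^T Q x - x^T b for symmetric positive definite Q."""
    x = x0.astype(float)
    g = Q @ x - b                      # step 2: g^(0)
    d = -g
    for _ in range(len(b)):            # at most n iterations in exact arithmetic
        if np.linalg.norm(g) < tol:    # stand-in for the g = 0 test
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)    # step 3
        x = x + alpha * d              # step 4
        g = Q @ x - b                  # step 5
        beta = (g @ Qd) / (d @ Qd)     # step 6
        d = -g + beta * d              # step 7
    return x

# Illustrative random test problem.
rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)

x = conjugate_gradient(Q, b, np.zeros(n))
print(np.allclose(x, np.linalg.solve(Q, b)))   # True
```

Note that the algorithm touches $\boldsymbol{Q}$ only through matrix-vector products, which is what makes conjugate gradient attractive for large sparse systems.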

Proposition

In the conjugate gradient algorithm, the directions $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(n-1)}$ are $\boldsymbol{Q}$-conjugate.

Proof

We use induction. We first show that $\boldsymbol{d}^{(0) \top} \boldsymbol{Q} \boldsymbol{d}^{(1)}=0$. To this end we write

$$\boldsymbol{d}^{(0) \top} \boldsymbol{Q} \boldsymbol{d}^{(1)}=\boldsymbol{d}^{(0) \top} \boldsymbol{Q}\left(-\boldsymbol{g}^{(1)}+\beta_0 \boldsymbol{d}^{(0)}\right).$$

Substituting

$$\beta_0=\frac{\boldsymbol{g}^{(1) \top} \boldsymbol{Q} \boldsymbol{d}^{(0)}}{\boldsymbol{d}^{(0) \top} \boldsymbol{Q} \boldsymbol{d}^{(0)}}$$

into the equation above, we see that $\boldsymbol{d}^{(0) \top} \boldsymbol{Q} \boldsymbol{d}^{(1)}=0$. We now assume that $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(k)}$, $k<n-1$, are $\boldsymbol{Q}$-conjugate directions. From Lemma 2 we have $\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(j)}=0$, $j=0,1, \ldots, k$. Thus, $\boldsymbol{g}^{(k+1)}$ is orthogonal to each of the directions $\boldsymbol{d}^{(0)}, \boldsymbol{d}^{(1)}, \ldots, \boldsymbol{d}^{(k)}$. We now show that

$$\boldsymbol{g}^{(k+1) \top} \boldsymbol{g}^{(j)}=0, \quad j=0,1, \ldots, k.$$

Fix $j \in\{0, \ldots, k\}$. For $j=0$ we have $\boldsymbol{d}^{(0)}=-\boldsymbol{g}^{(0)}$, so the claim follows immediately from $\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(0)}=0$; hence assume $j \geq 1$. We have

$$\boldsymbol{d}^{(j)}=-\boldsymbol{g}^{(j)}+\beta_{j-1} \boldsymbol{d}^{(j-1)}.$$

Substituting this equation into the previous one yields

$$\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(j)}=0=-\boldsymbol{g}^{(k+1) \top} \boldsymbol{g}^{(j)}+\beta_{j-1} \boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(j-1)}.$$

Because $\boldsymbol{g}^{(k+1) \top} \boldsymbol{d}^{(j-1)}=0$, it follows that $\boldsymbol{g}^{(k+1) \top} \boldsymbol{g}^{(j)}=0$. We are now ready to show that $\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=0$, $j=0, \ldots, k$. We have

$$\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=\left(-\boldsymbol{g}^{(k+1)}+\beta_k \boldsymbol{d}^{(k)}\right)^{\top} \boldsymbol{Q} \boldsymbol{d}^{(j)}.$$

If $j<k$, then $\boldsymbol{d}^{(k) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=0$ by the induction hypothesis. Hence, we have

$$\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=-\boldsymbol{g}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}.$$

But $\boldsymbol{g}^{(j+1)}=\boldsymbol{g}^{(j)}+\alpha_j \boldsymbol{Q} \boldsymbol{d}^{(j)}$, so $\boldsymbol{Q} \boldsymbol{d}^{(j)}=\left(\boldsymbol{g}^{(j+1)}-\boldsymbol{g}^{(j)}\right) / \alpha_j$. Because $\boldsymbol{g}^{(k+1) \top} \boldsymbol{g}^{(i)}=0$, $i=0, \ldots, k$,

$$\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=-\boldsymbol{g}^{(k+1) \top} \frac{\boldsymbol{g}^{(j+1)}-\boldsymbol{g}^{(j)}}{\alpha_j}=0.$$

Thus,

$$\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(j)}=0, \quad j=0, \ldots, k-1.$$

It remains to be shown that $\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}=0$. We have

$$\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}=\left(-\boldsymbol{g}^{(k+1)}+\beta_k \boldsymbol{d}^{(k)}\right)^{\top} \boldsymbol{Q} \boldsymbol{d}^{(k)}.$$

Using the expression for $\beta_k$, we get $\boldsymbol{d}^{(k+1) \top} \boldsymbol{Q} \boldsymbol{d}^{(k)}=0$, which completes the proof.